Natural emergence of clusters and bursts in network evolution 
Abstract

Network models with preferential attachment, where new nodes are injected into the network and 
form links with existing nodes proportional to their current connectivity, have been well studied 
for some time. Extensions have been introduced where nodes attach proportional to arbitrary 
fitness functions. However, in these models attaching to a node increases the ability of that node 
to gain more links in the future. We study network growth where nodes attach proportional to 
the clustering coefficients, or local densities of triangles, of existing nodes. Attaching to a node 
typically lowers its clustering coefficient, in contrast to preferential attachment or rich-get-richer 
models. This simple modification naturally leads to a variety of rich phenomena, including
non-Poissonian bursty dynamics, community formation, aging, and renewal. This shows that complex
network structure can be modeled without artificially imposing multiple dynamical mechanisms.



INTRODUCTION 



Growing network models have been introduced to study the topological evolution of
systems such as citations between scientific articles [IH1], protein interactions in various
organisms [51 E], the world wide web [7], and more (SUH]. Meanwhile, recent interest has been
drawn towards understanding not simply the topology of these systems, i.e., how the individual
system elements interact, but also the temporal nature of these interactions [10]. For example,
studies of the burstiness of human dynamics [TJ] IT2], whether by letter writing [13] or
mobile phone usage [H], have advanced our knowledge of how information spreads [ToTTlT]
through systems mediated by such dynamics [HJ HH [19].

The most popular mechanism to model growing networks remains Preferential Attachment
(PA) [7] [20]. The original PA model starts from a small seed network that grows
by injecting nodes one at a time, and each newly injected node connects to $m$ existing
nodes. Each existing node i is chosen randomly from the current network with a probability
proportional to its degree: $P_{\mathrm{PA}}(i) = k_i / \sum_j k_j$, where $k_i$ is the degree, or number of
neighbors, of node i. This "rich-get-richer" mechanism leads to scale-free degree distributions,
$P(k) \sim k^{-(1+a)}$, where the earliest nodes will, over time, emerge as the wealthiest hubs in
the network, accruing far more links than those nodes injected at later times. This strong
early-mover advantage is one of the most striking features of PA.

Yet PA fails to account for a number of factors: (i) It is a pure growth model whereas
many evolving networks are equilibrated in size; (ii) It is unclear what happens when node
attachments depend on higher-order structures in a network, such as features of the
neighborhood of a node; (iii) It fails to exhibit correlated network structures such as dense, modular
clusters of nodes known as communities [21] and has pathologically low clustering [22] (very
few triangles are formed between nodes) compared with real networks; (iv) It has pure
positive feedback giving rise to its strong rich-get-richer effects, yet real systems must possess
"aging" where nodes lose their ability to gain links over time [I]; (v) Its early-mover
advantage is appropriate primarily for systems where the origin of time is meaningful. To account
for some of these concerns, generalized fitness variables [H [23] and temporal correlations [21]
have been externally imposed onto the original PA model. Such extensions remain a popular
area of research.

We study a general question: what happens if attachments occur based on a node's
neighborhood? We introduce a network growth and evolution model based on attachment
probabilities that are proportional to the connectivity in the neighborhood of the node.
Surprisingly, this simple model addresses many of PA's limitations by exhibiting emergent
aging and temporally correlated dynamics. The model naturally possesses negative feedback
in the attachment fitnesses of existing nodes. This negative feedback mechanism, well studied
in many areas such as neuroscience and dynamical systems, has been under-explored in the
area of network growth and evolution. Numerical investigations supported by theory show
that these effects are controlled entirely by attachment alone; no additional, artificially
imposed "rules" are necessary.



MODEL



We adapt the original preferential attachment network growth model in the following
way. Instead of attaching to an existing node i with probability proportional to its degree
$k_i$, we attach proportional to its clustering coefficient (Clustering Attachment, CA):

$$P_{\mathrm{CA}}(i) \propto c_i^{\alpha} + \epsilon, \quad \text{where} \quad c_i = \frac{2\Delta_i}{k_i(k_i - 1)} \tag{1}$$

is the clustering coefficient of node i, $\Delta_i$ is the number of links between neighbors of i
or equivalently the number of triangles involving node i, $\epsilon$ is a constant probability for
attachment which may be zero, and the exponent $\alpha$ is a parameter in our model. Other
aspects of network growth remain the same. (We assume each new node attaches to $m = 2$
existing nodes throughout; the features are the same for $m > 2$ but calculations become
more cumbersome.) We investigate both growing and fixed-size evolving networks. For the
latter a random node is removed at the same time a new node is added.

For the original PA mechanism the only possible "reaction" upon attaching to i is to
increment its degree, i.e., $k_i \to k_i + 1$. For CA, however, two reactions are possible:
$(k_i \to k_i + 1,\ \Delta_i \to \Delta_i)$ or $(k_i \to k_i + 1,\ \Delta_i \to \Delta_i + 1)$. While the degree always grows, the number
of triangles $\Delta_i$ around i depends on whether a neighbor of i also receives a new link.

These two reactions lead to the following potential changes in the clustering coefficient
of the existing node before and after the attachment:

$$\delta_{\Delta} c_i = \frac{2(\Delta_i + 1)}{k_i(k_i + 1)} - \frac{2\Delta_i}{k_i(k_i - 1)} = \frac{2}{k_i + 1}\left(\frac{1}{k_i} - c_i\right), \qquad \delta_{0} c_i = \frac{2\Delta_i}{k_i(k_i + 1)} - \frac{2\Delta_i}{k_i(k_i - 1)} = -\frac{2 c_i}{k_i + 1}. \tag{2}$$

Here $\delta_{\Delta} c_i$ is the change due to connecting to i and a neighbor of i, while $\delta_{0} c_i$ is the change
due to connecting to i and a non-neighbor of i. Even when a new triangle is formed, the
clustering coefficient after an attachment is almost always less than it was before: an increase
in c after a new node's attachment is only possible if the existing node has degree k < 1/c.
This means that, in contrast to PA, the CA mechanism does not feature rich-get-richer
effects. Instead, attaching to a node i drives down i's probability for further attachments.
Forming new links based on the clustering coefficient provides a particularly simple model
of such negative feedback or preferential inhibition.
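
The effect of the two reactions on the clustering coefficient can be checked with exact arithmetic. The following is a minimal sketch, assuming only the definition c = 2Δ/(k(k - 1)); the function names `clustering` and `clustering_change` are illustrative and not part of the model description.

```python
from fractions import Fraction


def clustering(k, triangles):
    """Clustering coefficient c = 2*triangles / (k*(k-1)); taken as 0 when k < 2."""
    if k < 2:
        return Fraction(0)
    return Fraction(2 * triangles, k * (k - 1))


def clustering_change(k, triangles, closes_triangle):
    """Change in c of an existing node when a new node attaches to it.

    closes_triangle=True means the new node's other link lands on a neighbor,
    so the triangle count grows by one; otherwise only the degree grows.
    """
    new_triangles = triangles + (1 if closes_triangle else 0)
    return clustering(k + 1, new_triangles) - clustering(k, triangles)


# k = 4 with one triangle: c = 1/6, so k < 1/c = 6 and a new triangle raises c.
print(clustering_change(4, 1, True))    # 1/30
# Same node, but the second link goes to a non-neighbor: c always drops.
print(clustering_change(4, 1, False))   # -1/15
# k = 4 with three triangles: c = 1/2, so k > 1/c and even a new triangle lowers c.
print(clustering_change(4, 3, True))    # -1/10
```

Only the first case satisfies k < 1/c, which illustrates why an increase in clustering after an attachment is the exception rather than the rule.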

Yet temporal effects play a role here as well, with the temporal sequence of node injections
determining what happens to subsequent nodes. For example, suppose a new node is injected
and happens to form a triangle. This will give that new node maximum c; it may become
a hot spot for future attachments. In Fig. 1(a) we draw a single realization of the CA model
with N = 1000 nodes and $\alpha = 2$. Qualitatively, we observe that CA dynamics naturally
gives rise to community structure [21], where the hot spot formed the seed for a new dense
group to grow. These communities tend to form sequentially: a hot spot forms, then many
nodes attach to it, driving its attractiveness down until the next seed forms. This emerges
naturally from the attachment mechanism; nothing has been artificially imposed.
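
A growing CA network of the kind described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code used for the figures: the seed network is taken to be a single triangle, the default `eps` value is arbitrary, and the m targets are drawn one after another without replacement. The names `grow_ca` and `clustering` are hypothetical.

```python
import random


def clustering(adj, i):
    """Clustering coefficient of node i: 2 * (links among neighbors) / (k * (k - 1))."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k)
                if nbrs[b] in adj[nbrs[a]])
    return 2.0 * links / (k * (k - 1))


def grow_ca(n_nodes, alpha=2.0, eps=0.01, m=2, seed=None):
    """Grow a network by clustering attachment, weight(i) = c_i**alpha + eps.

    Assumptions (not specified in the text): triangle seed, targets sampled
    sequentially without replacement. eps must be positive whenever every
    existing node could have zero clustering, or sampling will fail.
    """
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # assumed seed network
    for new in range(3, n_nodes):
        nodes = list(adj)
        weights = [clustering(adj, i) ** alpha + eps for i in nodes]
        targets = []
        for _ in range(m):
            pick = rng.choices(range(len(nodes)), weights=weights)[0]
            targets.append(nodes.pop(pick))
            weights.pop(pick)
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj


network = grow_ca(200, alpha=2.0, seed=1)
hot_spots = [i for i in network if clustering(network, i) == 1.0]
print(len(network), "nodes,", len(hot_spots), "nodes with c = 1")
```

Recomputing every clustering coefficient at each step keeps the sketch short but costs far more than an incremental update would; it is adequate only for small networks.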

We quantify the evolution of these communities by running a community detection
method [25] as a network grows according to CA. Figure 1(b) depicts the optimized
modularity Q of the communities found by the method. Higher values of Q indicate "better"
communities [2B] (although raw values of Q should be interpreted with caution [27J EH]), with
Q = 1 being the maximum value possible. Modularity grows rapidly and then saturates,
afterwards oscillating around this saturation value. This occurs due to the sequential growth
and decay of communities: a dense community forms, boosting Q, then it becomes sparser
as more nodes attach to the dense community, lowering Q until a new community forms and
the process repeats. These oscillations appear more pronounced at larger values of $\alpha$. After
growing the network for 1000 timesteps, we then fix its size by removing a random node
after the injection of each new node. The core effects of community emergence also exist
in the equilibrated system and are not a transient effect. The distributions of Q at random
times [Fig. 1(c)] and the average value of Q after 10,000 timesteps of evolution [Fig. 1(d)] both
show that these networks become more modular as $\alpha$ increases.
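
For readers unfamiliar with modularity, the quantity Q can be evaluated for any given partition with the standard formula Q = Σ_c [L_c/L - (d_c/2L)^2], where L is the total number of links, L_c the number of links inside community c, and d_c the summed degree of its nodes. The sketch below only evaluates Q for a supplied partition; it does not perform the optimization carried out by the detection method [25], and the function name `modularity` and the dictionary-based inputs are assumptions made for illustration.

```python
def modularity(adj, community_of):
    """Modularity Q of a partition of an undirected, unweighted network.

    adj maps each node to the set of its neighbors; community_of maps each
    node to a community label. Node labels must be mutually comparable.
    """
    n_links = sum(len(nbrs) for nbrs in adj.values()) / 2
    if n_links == 0:
        return 0.0
    internal = {}    # links with both ends inside the community
    degree_sum = {}  # total degree of the community's nodes
    for i, nbrs in adj.items():
        ci = community_of[i]
        degree_sum[ci] = degree_sum.get(ci, 0) + len(nbrs)
        for j in nbrs:
            if i < j and community_of[j] == ci:
                internal[ci] = internal.get(ci, 0) + 1
    return sum(internal.get(c, 0) / n_links - (d / (2 * n_links)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by a single link, split into their natural groups.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(two_triangles, labels))  # 5/14, about 0.357
```

Placing all nodes in one community gives Q = 0, which is why Q rises when a dense group separates from the rest of the network and falls as that group is diluted.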

Note that this robustness of the dynamics is substantially different from the key qualities
generated by PA. Although PA can generate characteristic power-law degree distributions,
this only happens in growing networks. Equilibrating PA networks by the mechanism above
qualitatively changes P(k) in PA networks [2S], and in this respect the nature of the degree
distribution is a transient effect in PA.

FIG. 1. (Color online) Network growth according to clustering. (a) A realization of clustering
attachment (CA; $\alpha = 2$). Node size is proportional to clustering and node color represents the age
of the node (time since it was injected). Communities emerge approximately sequentially in time.
(b) Running a community detection algorithm [25] while a network grows according to CA, we
observe the rapid appearance of modular structure (according to modularity Q). At t = 1000 we
stop growth but continue evolution by removing a random node after each injection. As $\alpha$ increases
the sequential emergence of communities becomes even more apparent, with Q oscillating about
a mean value. This occurs for both growing and stationary networks. (c) The distributions of Q
at random times during the stationary evolutions shown in (b). (d) The average Q after 10,000
timesteps. Error bars denote ±1 s.d.

That CA simultaneously gives rise to both correlated network structure and temporal 
dynamics is interesting and unexpected. To understand and characterize the dynamics of 
CA, we now explore (i) the aging dynamics of individual nodes after they are injected, and 
(ii) the temporally correlated influence that earlier nodes exert upon later nodes. For the
latter, we fix the size of the CA networks by removing a randomly chosen node alongside
each new injection, as in Fig. 1(b).

When a new node is injected into the system, its degree k(t) and clustering c(t) will evolve
with the time since injection t. This new node may then exert an influence on the time course
of subsequent nodes. To see this qualitatively, Fig. 2 depicts "space-time" matrices for three
realizations of CA. In this matrix, each N × 1 column represents the clustering coefficients
of the network's nodes at that time. Nodes are ordered by age. At each time step the oldest node
is removed and a new node injected such that the time course of c for each node forms a diagonal streak
across the matrix. Below each matrix we plot a spike train highlighting the appearances of
high-clustering nodes. As $\alpha$ increases, the arrivals of high-clustering nodes become temporally
correlated and the clustering coefficients of those nodes decay more slowly. This means that both
individual aging effects and temporal correlations are affected by the CA mechanism.

More quantitatively, by averaging over many realizations, we measure the expected time
courses c(t) and k(t) for nodes that are injected with c = 1, shown in Fig. 3. These time
courses exhibit approximate power-law decay (growth) in time for c (k).

To understand the time scaling of c and k, consider the following simple analysis: First,
$dk/dt = P_{\mathrm{CA}}$ and $P_{\mathrm{CA}} \sim c(k,\Delta)^{\alpha} \sim \Delta(t)^{\alpha} \left[k(t)(k(t) - 1)\right]^{-\alpha} \sim \Delta^{\alpha} k^{-2\alpha}$. Assuming
$\Delta$ is approximately constant in time gives $dk/dt \sim k^{-2\alpha}$ or

$$k(t) \sim t^{1/(2\alpha+1)}, \qquad c(t) \sim t^{-2/(2\alpha+1)}, \tag{3}$$

where c(t) follows from $c(t) \sim k^{-2}$. Thus we predict, if the time evolution of $\Delta$ is negligible,
power-law growth in time for degree with exponent $1/(2\alpha+1)$ and power-law decay in time
for clustering with exponent $-2/(2\alpha+1)$. Despite the simplicity of this calculation we find
good agreement between simulations and the predicted exponents in Eq. (3); see Fig. 3.
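
The integration step behind these exponents is short enough to spell out. The lines below are a worked sketch of the separation of variables, assuming as in the text that the triangle count stays roughly constant and that k is large enough for k(k - 1) to be replaced by k squared; the case α = 2 is included only as a numerical illustration.

```latex
\begin{align*}
\frac{dk}{dt} \sim k^{-2\alpha}
  \;&\Longrightarrow\; k^{2\alpha}\, dk \sim dt
  \;\Longrightarrow\; \frac{k^{2\alpha+1}}{2\alpha+1} \sim t
  \;\Longrightarrow\; k(t) \sim t^{1/(2\alpha+1)}, \\
c(t) \sim k(t)^{-2}
  \;&\Longrightarrow\; c(t) \sim t^{-2/(2\alpha+1)}, \\
\alpha = 2:\quad & k(t) \sim t^{1/5}, \qquad c(t) \sim t^{-2/5}.
\end{align*}
```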

FIG. 2. (Color online) Space-time evolution for fixed-size networks of N = 100 nodes. Each matrix
element (i, t) represents the clustering $c_i(t)$ of node i at time t. Nodes are indexed from oldest
(i = 1) to youngest (i = N). At each time step a new node is injected and the oldest node removed
such that the time course of an individual node forms a diagonal across the matrix. Below each
matrix is a spike train denoting injections of high-clustering nodes. As $\alpha$ increases, the clustering
coefficients of individual nodes persist for longer times and the arrivals of high-clustering nodes
become increasingly temporally correlated.

Yet, knowing the expected temporal scaling of individual nodes' c(t) and k(t) is
insufficient to understand the emergence of the network structures that we observe. We also
need to understand the temporal nature of hot spot injection times. Thus we turn to the
time series of triangle injections, or the times when nodes are introduced with c = 1. (For
m > 2, one can consider the times when new nodes appear with c > 0.) These correspond
to the injections of high-clustering nodes in Fig. 2.

If a system displays no memory such that the probability for a spike during any time
interval $(t, t + \delta t)$ depends only on $\delta t$, then the triangle injections form a Poisson process and
the interevent time, or the waiting time between spikes, follows an exponential distribution.
Yet many systems do not follow Poisson processes. Indeed, much effort has gone into
studying the bursty temporal features of human dynamics [Til IT3]. A phenomenon is considered
bursty when it possesses a memory, i.e., the probability for a new event decays with the
time since the last event, giving rise to a non-exponential interevent time distribution.

FIG. 3. (Color online) Expected time courses of (a) clustering and (b) degree as a function of
time since injection t for growing networks with different attachment exponents $\alpha$. Straight lines
correspond to predictions $c(t) \sim t^{-2/(2\alpha+1)}$ and $k(t) \sim t^{1/(2\alpha+1)}$. We observe the same scaling
for growing and stationary systems, though the latter additionally feature a system-size-dependent
exponential cutoff.

In Fig. 4(a) we study the interevent time distribution for triangle injections during CA
network evolution. (As mentioned before, to ensure the system is stationary, for the temporal
dynamics in Fig. 4 we now fix the size of the network by removing one node at each time step
as well.) When $\alpha = 0$ there is no memory and the distribution is exponential, as expected.
As $\alpha$ grows, however, the interevent time distribution becomes more and more heavy-tailed,
indicating increased probability for a triangle to form soon after a previous triangle was
introduced.

A clear way to study bursty dynamics is through the hazard function h(t) = P(t)/Q(t),
where P(t) and Q(t) are the probability and complementary cumulative distributions of the waiting time t,
respectively. The hazard function can be interpreted as the probability rate for a new
spike to occur t timesteps following the previous spike, given that no spikes occur in the
intervening time interval. We measure the hazard functions in Fig. 4(b). For a Poisson
process h(t) is constant. Increasing $\alpha$ gives increasingly non-Poissonian hazard functions:
the CA mechanism naturally incorporates bursty time dynamics in the sequences of triangle
injections.
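
The hazard function can be estimated directly from a list of spike times. The following is a minimal sketch for discrete time steps, assuming the convention that Q(t) is the fraction of waiting times that are at least t; the function names `interevent_times` and `empirical_hazard` are illustrative, and no binning or smoothing is attempted.

```python
from collections import Counter


def interevent_times(spike_times):
    """Waiting times between consecutive spikes (spike_times must be sorted)."""
    return [later - earlier for earlier, later in zip(spike_times, spike_times[1:])]


def empirical_hazard(waits):
    """Discrete hazard h(t) = P(t) / Q(t), with Q(t) the fraction of waits >= t.

    Equivalently: among waits that lasted at least t, the fraction ending at t.
    """
    counts = Counter(waits)
    remaining = len(waits)
    hazard = {}
    for t in sorted(counts):
        hazard[t] = counts[t] / remaining
        remaining -= counts[t]
    return hazard


spikes = [0, 1, 2, 4, 7]
waits = interevent_times(spikes)
print(waits)                     # [1, 1, 2, 3]
print(empirical_hazard(waits))   # {1: 0.5, 2: 0.5, 3: 1.0}
```

For a memoryless spike train the estimated values scatter around a constant, while a hazard that falls with t signals bursty dynamics.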

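In general, the interevent time distribution in many systems, including bursty systems, is
well described by the Weibull distribution $P(t) = h(t) \exp\left[-(t/\lambda)^{\kappa}\right]$, with parameters
$\kappa$ and $\lambda$ and hazard function

$$h(t) = \frac{\kappa}{\lambda} \left(\frac{t}{\lambda}\right)^{\kappa - 1}. \tag{4}$$

When $\kappa = 1$, Eq. (4) corresponds to a Poisson process.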

We now unify the bursty time dynamics for triangle formation with the aging time courses
for node clustering [Eq. (3)]. For an active system in equilibrium the density of spikes $\rho(t)$
at time t should become approximately constant (i.e., independent of time) such that the
expected number of spikes emitted in a time interval $(t, t + \Delta t)$ is $\sim \Delta t$. (This is not the same
as a Poisson process, as the expectation is over an ensemble of CA realizations.) Suppose
a spike occurred at some past time $\tau < t$ (without loss of generality we shift time so that
$\tau = 0$). Then, assuming spikes are rare, a point we will return to, we approximate the spike
density at t by Eq. (5).

In other words, a spike occurs at t depending on the probability for the most recent preceding
spike to occur at s (which is itself governed by the hazard function for the spike at 0) weighted
by the clustering at time t.
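
One way to write the approximation just described is as an integral over the time s of the most recent preceding spike. The form below is an assumption consistent with the verbal description, not a quotation of the original expression: h(s) is the hazard for a spike at s following the spike at 0, and c(t - s) is read as the clustering, at time t, of the node injected at s.

```latex
\[
  \rho(t) \;\approx\; \int_0^{t} h(s)\, c(t - s)\, ds
\]
```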

Given Eq. (5), what hazard function will give rise to a constant $\rho$? If h(t) = const we
have Eq. (6), where $\beta = 2/(2\alpha + 1)$ from Eq. (3) and the second relation follows by introducing a constant
A to ensure the initial condition c(0) = 1 and that the integral does not diverge. When $\beta > 1$,
$\rho(t) \to \mathrm{const}$ as $t \to \infty$, and thus we expect an equilibrium system to be a Poisson process
for $\alpha < 1/2$.
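
The role of the threshold β = 1 can be illustrated with an explicit integral. The sketch below rests on two assumptions that are not taken from the text: that the spike density is the integral of h(s) c(t - s) over s, and that the clustering decay is regularized as c(t) = (1 + t/A)^(-β), which is one simple choice satisfying c(0) = 1.

```latex
\begin{align*}
\rho(t) \;&\sim\; \int_0^{t} \left(1 + \frac{t - s}{A}\right)^{-\beta} ds
  \;=\; \frac{A}{\beta - 1}\left[\,1 - \left(1 + \frac{t}{A}\right)^{1-\beta}\right], \\
\beta > 1:\quad & \rho(t) \to \frac{A}{\beta - 1} \quad (t \to \infty), \\
\beta < 1:\quad & \rho(t) \sim t^{\,1-\beta} \quad \text{(grows without bound)}, \\
\beta = \frac{2}{2\alpha + 1} > 1 \;&\Longleftrightarrow\; \alpha < \frac{1}{2}.
\end{align*}
```

Under these assumptions a constant hazard is compatible with a constant spike density only when β exceeds 1.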

FIG. 4. (Color online) Bursty temporal features of CA. (a) The interevent time distribution. Solid
lines represent fitted Weibull distributions. (b) The measured hazard functions and $h(t) \sim t^{\kappa - 1}$.
When $\alpha = 0$ we recover the constant h(t) ($\kappa = 1$) corresponding to a Poisson process. (Inset) The
observed relationship between $\alpha$ and the fitted $\kappa$. The solid line is the prediction $\kappa = 2/(2\alpha + 1)$
of Eq. (8).

When $\beta < 1$, however, no Poisson process can be in equilibrium for our expected c(t).
Instead, a time-dependent hazard function $h(t) \sim t^{\kappa - 1}$ ($\kappa \neq 1$) is necessary [Eq. (7)],
where the latter relation holds when $\beta < 1$. Therefore the system will be in equilibrium when $\kappa = \beta$.
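
The equilibrium condition can be made explicit with a scaling argument. The sketch below assumes that the spike density is the integral of h(s) c(t - s) over s with pure power laws for both factors; it is offered as an illustration of why the two exponents must match, not as the original derivation. Substituting s = tu gives:

```latex
\begin{align*}
\rho(t) \;&\sim\; \int_0^{t} s^{\kappa - 1} (t - s)^{-\beta}\, ds
  \;=\; t^{\,\kappa - \beta} \int_0^{1} u^{\kappa - 1} (1 - u)^{-\beta}\, du
  \;=\; t^{\,\kappa - \beta}\, B(\kappa,\, 1 - \beta), \\
& \text{with } B(\kappa, 1 - \beta) \text{ finite for } \kappa > 0 \text{ and } \beta < 1 .
\end{align*}
```

The time dependence drops out only when the exponent of t vanishes, which is the condition that the Weibull shape parameter equals β.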

As we mentioned, Eq. (5) is most valid at low spike densities, where the typical time
between spikes is much greater than the typical time it takes for c(t) to decay. For higher
densities, the probability for a new spike to occur at time t will depend upon a superposition
of earlier spikes. Yet the contributions of the earlier spikes will each be time-independent
when $\kappa = \beta$. Thus our derivation should hold even at higher spike densities.

In summary, if the above arguments hold we expect an equilibrium system to exhibit a
hazard function $h(t) \sim t^{\kappa - 1}$ with

$$\kappa = \begin{cases} 1 & \text{if } \alpha < 1/2, \\ 2/(2\alpha + 1) & \text{if } \alpha > 1/2. \end{cases} \tag{8}$$

Indeed, there is good evidence for this relationship in the inset of Fig. 4.



DISCUSSION 



There is much room for modeling network growth besides the traditional degree-based
preferential attachment. A simple twist on this seminal work is to form attachments based
on the clustering coefficient. Doing so naturally creates a negative feedback mechanism
which leads to aging, burstiness, and the formation of community structure in networks.
The simplicity and robustness of this mechanism are encouraging and may serve as a starting
point for investigating the origin of higher-order structures in growing networks as well as
evolving networks that are in equilibrium. The emergence of communities and highly variable
temporal behavior observed in many complex networks, social networks in particular, can
be investigated from a CA perspective. Our results predict that if a node's attractiveness is
determined by the local density of connections, higher-order network structures are a natural
and generic consequence. Based on our results, it may be promising to investigate systems
in which attachment propensities are determined by other centrality measures that capture
different aspects of local network properties.